Asymptotic Theory for Random Forests
Random forests have proven to be reliable predictive algorithms in many
application areas. Not much is known, however, about the statistical properties
of random forests. Several authors have established conditions under which
their predictions are consistent, but these results do not provide practical
estimates of random forest errors. In this paper, we analyze a random forest
model based on subsampling, and show that random forest predictions are
asymptotically normal provided that the subsample size s scales as s(n)/n =
o(log(n)^{-d}), where n is the number of training examples and d is the number
of features. Moreover, we show that the asymptotic variance can consistently be
estimated using an infinitesimal jackknife for bagged ensembles recently
proposed by Efron (2014). In other words, our results let us both characterize
and estimate the error-distribution of random forest predictions, thus taking a
step towards making random forests tools for statistical inference instead of
just black-box predictive algorithms.
Comment: This manuscript is superseded by "Estimation and Inference of Heterogeneous Treatment Effects using Random Forests" by Wager and Athey (arXiv:1510.04342). The new paper extends the asymptotic theory developed here, and applies it to causal inference in the potential outcomes framework with unconfoundedness. The present version is maintained online for archival purposes only.
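The infinitesimal jackknife variance estimate described in the abstract can be sketched numerically. Below is a minimal illustration in which each "tree" is simply the mean of its subsample's responses (a real forest would fit a regression tree to the subsample); the variable names are my own, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, B = 200, 2000          # training points and number of subsampled "trees"
s = 20                    # subsample size, small relative to n
y = rng.normal(size=n)

N = np.zeros((B, n))      # N[b, i] = 1 if point i is in subsample b
t = np.zeros(B)           # t[b] = prediction of "tree" b
for b in range(B):
    idx = rng.choice(n, size=s, replace=False)
    N[b, idx] = 1.0
    t[b] = y[idx].mean()

# Infinitesimal jackknife for bagged ensembles (Efron 2014):
# V_IJ = sum_i Cov_b(N[b, i], t[b])^2
cov = ((N - N.mean(axis=0)) * (t - t.mean())[:, None]).mean(axis=0)
V_IJ = np.sum(cov ** 2)
print(V_IJ)
```

The same covariance-of-inclusion computation applies tree-by-tree at any test point of a genuine random forest.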
The Efficiency of Density Deconvolution
The density deconvolution problem involves recovering a target density g from
a sample that has been corrupted by noise. From the perspective of Le Cam's
local asymptotic normality theory, we show that non-parametric density
deconvolution with Gaussian noise behaves similarly to a low-dimensional
parametric problem that can easily be solved by maximum likelihood. This
framework allows us to give a simple account of the statistical efficiency of
density deconvolution and to concisely describe the effect of Gaussian noise on
our ability to estimate g, all while relying on classical maximum likelihood
theory instead of the kernel estimators typically used to study density
deconvolution.
Subsampling Extremes: From Block Maxima to Smooth Tail Estimation
We study a new estimator for the tail index of a distribution in the Frechet
domain of attraction that arises naturally by computing subsample maxima. This
estimator is equivalent to taking a U-statistic over a Hill estimator with two
order statistics. The estimator presents multiple advantages over the Hill
estimator. In particular, it has asymptotically smooth sample paths as a
function of the threshold k, making it considerably more stable than the Hill
estimator. The estimator also admits a simple and intuitive threshold selection
rule that does not require fitting a second-order model.
Journal of Multivariate Analysis, 130, 2014
Comment: Added reference
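For reference, the classical Hill estimator that this abstract takes as its baseline can be computed as below; a minimal sketch on simulated Pareto data (the paper's smoothed subsample-maxima estimator itself is not reproduced here):

```python
import numpy as np

def hill_estimator(x, k):
    """Hill estimate of gamma = 1/alpha using the top k order statistics."""
    xs = np.sort(x)
    top = xs[-k:]                  # the k largest observations
    threshold = xs[-k - 1]         # the (k+1)-th largest, used as threshold
    return np.mean(np.log(top) - np.log(threshold))

rng = np.random.default_rng(1)
n = 10_000
alpha = 2.0                        # Pareto tail index; gamma = 1/alpha = 0.5
x = rng.uniform(size=n) ** (-1.0 / alpha)
print(hill_estimator(x, k=500))    # should be near 0.5
```

Plotting this estimate against k exhibits the well-known instability ("Hill horror plot") that the subsample-maxima estimator is designed to smooth out.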
Quasi-Oracle Estimation of Heterogeneous Treatment Effects
Flexible estimation of heterogeneous treatment effects lies at the heart of
many statistical challenges, such as personalized medicine and optimal resource
allocation. In this paper, we develop a general class of two-step algorithms
for heterogeneous treatment effect estimation in observational studies. We
first estimate marginal effects and treatment propensities in order to form an
objective function that isolates the causal component of the signal. Then, we
optimize this data-adaptive objective function. Our approach has several
advantages over existing methods. From a practical perspective, our method is
flexible and easy to use: In both steps, we can use any loss-minimization
method, e.g., penalized regression, deep neural networks, or boosting;
moreover, these methods can be fine-tuned by cross validation. Meanwhile, in
the case of penalized kernel regression, we show that our method has a
quasi-oracle property: Even if the pilot estimates for marginal effects and
treatment propensities are not particularly accurate, we achieve the same error
bounds as an oracle who has a priori knowledge of these two nuisance
components. We implement variants of our approach based on penalized
regression, kernel ridge regression, and boosting in a variety of simulation
setups, and find promising performance relative to existing baselines.
Comment: Biometrika, forthcoming
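The two-step idea can be illustrated in its simplest form: residualize the outcome on a pilot estimate of the marginal effect m(x) and the treatment on the propensity e(x), then regress residual on residual. A minimal sketch with a constant treatment effect and oracle nuisance values (all names are illustrative; a real application would estimate m and e with flexible learners and cross-fitting):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5000
x = rng.normal(size=n)
e = 0.5 * np.ones(n)                    # known propensity (randomized design)
w = rng.binomial(1, e)                  # treatment assignment
tau_true = 1.5                          # constant treatment effect

y = x + w * tau_true + rng.normal(size=n)
m = x + e * tau_true                    # oracle marginal effect E[Y | X]

# Residual-on-residual least squares for a constant effect:
tau_hat = np.sum((y - m) * (w - e)) / np.sum((w - e) ** 2)
print(tau_hat)                          # close to 1.5
```

Replacing the closed-form least-squares step with any penalized loss minimizer over tau(x) gives the heterogeneous-effect version of the procedure.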
Experimenting in Equilibrium
Classical approaches to experimental design assume that intervening on one
unit does not affect other units. There are many important settings, however,
where this non-interference assumption does not hold, as when running
experiments on supply-side incentives on a ride-sharing platform or subsidies
in an energy marketplace. In this paper, we introduce a new approach to
experimental design in large-scale stochastic systems with considerable
cross-unit interference, under an assumption that the interference is
structured enough that it can be captured via mean-field modeling. Our approach
enables us to accurately estimate the effect of small changes to system
parameters by combining unobtrusive randomization with lightweight modeling,
all while remaining in equilibrium. We can then use these estimates to optimize
the system by gradient descent. Concretely, we focus on the problem of a
platform that seeks to optimize supply-side payments p in a centralized
marketplace where different suppliers interact via their effects on the overall
supply-demand equilibrium, and show that our approach enables the platform to
optimize p in large systems using vanishingly small perturbations.
Comment: Forthcoming in Management Science
Adaptive Concentration of Regression Trees, with Application to Random Forests
We study the convergence of the predictive surface of regression trees and
forests. To support our analysis we introduce a notion of adaptive
concentration for regression trees. This approach breaks tree training into a
model selection phase in which we pick the tree splits, followed by a model
fitting phase where we find the best regression model consistent with these
splits. We then show that the fitted regression tree concentrates around the
optimal predictor with the same splits: as d and n get large, the discrepancy
is with high probability bounded on the order of sqrt(log(d) log(n)/k)
uniformly over the whole regression surface, where d is the dimension of the
feature space, n is the number of training examples, and k is the minimum leaf
size for each tree. We also provide rate-matching lower bounds for this
adaptive concentration statement. From a practical perspective, our result
enables us to prove consistency results for adaptively grown forests in high
dimensions, and to carry out valid post-selection inference in the sense of
Berk et al. [2013] for subgroups defined by tree leaves.
Confidence Intervals for Nonparametric Empirical Bayes Analysis
In an empirical Bayes analysis, we use data from repeated sampling to imitate
inferences made by an oracle Bayesian with extensive knowledge of the
data-generating distribution. Existing results provide a comprehensive
characterization of when and why empirical Bayes point estimates accurately
recover oracle Bayes behavior. In this paper, we develop flexible and practical
confidence intervals that provide asymptotic frequentist coverage of empirical
Bayes estimands, such as the posterior mean or the local false sign rate. The
coverage statements hold even when the estimands are only partially identified
or when empirical Bayes point estimates converge very slowly.
High-Dimensional Asymptotics of Prediction: Ridge Regression and Classification
We provide a unified analysis of the predictive risk of ridge regression and
regularized discriminant analysis in a dense random effects model. We work in a
high-dimensional asymptotic regime where p and n grow large with p/n converging to a fixed aspect ratio gamma > 0, and allow for arbitrary covariance among the features. For
both methods, we provide an explicit and efficiently computable expression for
the limiting predictive risk, which depends only on the spectrum of the
feature-covariance matrix, the signal strength, and the aspect ratio gamma.
Especially in the case of regularized discriminant analysis, we find that
predictive accuracy has a nuanced dependence on the eigenvalue distribution of
the covariance matrix, suggesting that analyses based on the operator norm of
the covariance matrix may not be sharp. Our results also uncover several
qualitative insights about both methods: for example, with ridge regression,
there is an exact inverse relation between the limiting predictive risk and the
limiting estimation risk given a fixed signal strength. Our analysis builds on
recent advances in random matrix theory.
Comment: Added a section on prediction versus estimation for ridge regression. Rewrote introduction. Other results unchanged.
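The high-dimensional regime can be probed numerically: holding the aspect ratio p/n fixed while n grows, the predictive risk of ridge regression in a dense random effects model stabilizes. A small Monte Carlo sketch under isotropic features (an illustrative setup, not the paper's exact model or risk formula):

```python
import numpy as np

def ridge_pred_risk(n, p, lam, alpha2=1.0, sigma2=1.0, reps=20, seed=0):
    """Monte Carlo excess prediction risk of ridge with dense random effects."""
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(reps):
        beta = rng.normal(scale=np.sqrt(alpha2 / p), size=p)  # dense effects
        X = rng.normal(size=(n, p))                           # isotropic features
        y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
        beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        # for isotropic test points, excess risk is ||beta - beta_hat||^2
        risks.append(np.sum((beta - beta_hat) ** 2))
    return float(np.mean(risks))

# same aspect ratio p/n = 0.5 at two scales; the two risks should be close
print(ridge_pred_risk(n=100, p=50, lam=50.0))
print(ridge_pred_risk(n=400, p=200, lam=200.0))
```

Scaling the penalty lam proportionally to n keeps the per-observation regularization fixed across the two problem sizes.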
Semiparametric Exponential Families for Heavy-Tailed Data
We propose a semiparametric method for fitting the tail of a heavy-tailed
population given a relatively small sample from that population and a larger
sample from a related background population. We model the tail of the small
sample as an exponential tilt of the better-observed large-sample tail, using a
robust sufficient statistic motivated by extreme value theory. In particular,
our method induces an estimator of the small-population mean, and we give
theoretical and empirical evidence that this estimator outperforms methods that
do not use the background sample. We demonstrate substantial efficiency gains
over competing methods in simulation and on data from a large controlled
experiment conducted by Facebook.
Comment: To appear in Biometrika
Estimation and Inference of Heterogeneous Treatment Effects using Random Forests
Many scientific and engineering challenges -- ranging from personalized
medicine to customized marketing recommendations -- require an understanding of
treatment effect heterogeneity. In this paper, we develop a non-parametric
causal forest for estimating heterogeneous treatment effects that extends
Breiman's widely used random forest algorithm. In the potential outcomes
framework with unconfoundedness, we show that causal forests are pointwise
consistent for the true treatment effect, and have an asymptotically Gaussian
and centered sampling distribution. We also discuss a practical method for
constructing asymptotic confidence intervals for the true treatment effect that
are centered at the causal forest estimates. Our theoretical results rely on a
generic Gaussian theory for a large family of random forest algorithms. To our
knowledge, this is the first set of results that allows any type of random
forest, including classification and regression forests, to be used for
provably valid statistical inference. In experiments, we find causal forests to
be substantially more powerful than classical methods based on nearest-neighbor
matching, especially in the presence of irrelevant covariates.
Comment: To appear in the Journal of the American Statistical Association. Part of the results developed in this paper was made available as an earlier technical report, "Asymptotic Theory for Random Forests" (arXiv:1405.0352).
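The classical nearest-neighbor matching baseline that causal forests are compared against can be sketched as follows: estimate tau(x) as the mean outcome of the k nearest treated neighbors minus that of the k nearest control neighbors (a minimal illustration, not the causal forest algorithm itself; all names are my own):

```python
import numpy as np

def knn_matching_effect(x0, X, y, w, k=10):
    """tau(x0) via k nearest treated minus k nearest control outcomes."""
    d = np.sum((X - x0) ** 2, axis=1)
    treated = np.argsort(np.where(w == 1, d, np.inf))[:k]
    control = np.argsort(np.where(w == 0, d, np.inf))[:k]
    return y[treated].mean() - y[control].mean()

rng = np.random.default_rng(3)
n, p = 2000, 2
X = rng.normal(size=(n, p))
w = rng.binomial(1, 0.5, size=n)
tau = 1.0 + X[:, 0]                      # effect heterogeneous in covariate 0
y = X[:, 1] + w * tau + rng.normal(scale=0.5, size=n)
print(knn_matching_effect(np.array([1.0, 0.0]), X, y, w, k=25))  # tau = 2 here
```

Because the Euclidean distance weights all covariates equally, adding irrelevant covariates dilutes the neighborhoods, which is the failure mode the abstract highlights relative to adaptively split forests.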
- …